# High-Resolution Image Processing

Deepeyes 7B
Apache-2.0
DeepEyes is a vision-language model that uses reinforcement learning to encourage 'thinking with images': it integrates visual information directly into its reasoning chain and performs well on image-text processing tasks.
Text-to-Image Transformers English
ChenShawn
383
2
Upernet Swin Large
MIT
UPerNet semantic segmentation model based on Swin Transformer architecture, suitable for high-precision image segmentation tasks
Image Segmentation
smp-hub
110
0
Upernet Swin Small
MIT
UPerNet semantic segmentation model based on Swin Transformer small architecture, suitable for scene parsing tasks like ADE20K
Image Segmentation
smp-hub
100
0
Upernet Swin Tiny
MIT
UPerNet semantic segmentation model based on the Swin Transformer tiny architecture, suitable for image segmentation tasks.
Image Segmentation Safetensors
smp-hub
191
0
Segformer B5 Finetuned Coralscapes 1024 1024
Apache-2.0
SegFormer model optimized for coral reef semantic segmentation tasks, fine-tuned on the Coralscapes dataset at 1024x1024 resolution
Image Segmentation Transformers
EPFL-ECEO
821
0
Segformer B2 Finetuned Coralscapes 1024 1024
Apache-2.0
This is a semantic segmentation model based on the SegFormer architecture, specifically optimized for coral reef ecosystem image segmentation tasks and fine-tuned on the Coralscapes dataset.
Image Segmentation Transformers
EPFL-ECEO
139
0
Vit So400m Patch14 Siglip Gap 448.pali Mix
Apache-2.0
A vision-language model based on the SigLIP image encoder, utilizing global average pooling, suitable for multimodal tasks.
Text-to-Image Transformers
timm
15
0
Segformer B3 1024x1024 City 160k
Other
A semantic segmentation model based on the Segformer architecture, optimized for the Cityscapes dataset
Image Segmentation
smp-hub
14
0
Segformer B0 1024x1024 City 160k
Other
A lightweight semantic segmentation model based on Segformer architecture, pre-trained on the Cityscapes dataset
Image Segmentation Safetensors
smp-hub
269
1
Segformer B2 1024x1024 City 160k
Other
A semantic segmentation model based on the Segformer architecture, specifically optimized for the Cityscapes dataset
Image Segmentation
smp-hub
651
0
Segformer B1 512x512 Ade 160k
Other
PyTorch-based Segformer model for semantic segmentation tasks, pre-trained on the ADE20K dataset
Image Segmentation
smp-hub
20
0
Mplug Owl3 7B 241101
Apache-2.0
mPLUG-Owl3 is a multimodal large language model focused on understanding long image sequences. Its hyper attention mechanism improves processing speed and extends the supported sequence length.
Text-to-Image Safetensors English
mPLUG
302
10
Beit Base Patch16 384.in1k Ft Fungitastic 384
A BEiT-based image classification model for identifying and classifying Danish fungal species.
Image Classification PyTorch
BVRA
456
1
Llava Jp 1.3b V1.1
LLaVA-JP is a multimodal vision-language model that supports Japanese, capable of understanding and generating descriptions and dialogues about input images.
Image-to-Text Transformers Japanese
toshi456
90
11
Vitamin XL 256px
MIT
ViTamin-XL-256px is a vision-language model based on the ViTamin architecture, designed for efficient visual feature extraction and multimodal tasks, supporting high-resolution image processing.
Text-to-Image Transformers
jienengchen
655
1
Vitamin XL 384px
MIT
ViTamin-XL-384px is a large-scale vision-language model based on the ViTamin architecture, specifically designed for vision-language tasks, supporting high-resolution image processing and multimodal feature extraction.
Image-to-Text Transformers
jienengchen
104
20
Siglip So400m 14 980 Flash Attn2 Navit
Apache-2.0
SigLIP-based vision model that raises the maximum input resolution to 980x980 by interpolating positional embeddings, and implements the NaViT strategy for variable-resolution, aspect-ratio-preserving image processing.
Text-to-Image Transformers
HuggingFaceM4
4,153
46
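The positional-embedding interpolation mentioned in the entry above can be illustrated with a minimal pure-Python sketch. It is not the model's actual code: the bilinear, corner-aligned scheme, the function names `interpolate_pos_embed` and `patch_grid`, and the patch size of 14 (inferred from the model name) are all assumptions made for illustration.

```python
def interpolate_pos_embed(grid, new_h, new_w):
    """Bilinearly resize a 2D grid of positional embeddings.

    grid: list of rows; each row is a list of embedding vectors (lists of floats).
    Returns a new_h x new_w grid with the same embedding dimension.
    Assumption: corner-aligned bilinear interpolation, chosen for illustration.
    """
    old_h, old_w = len(grid), len(grid[0])
    dim = len(grid[0][0])
    out = []
    for i in range(new_h):
        # Map the target row index onto the source grid.
        y = i * (old_h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, old_h - 1)
        wy = y - y0
        row = []
        for j in range(new_w):
            # Map the target column index onto the source grid.
            x = j * (old_w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, old_w - 1)
            wx = x - x0
            vec = []
            for d in range(dim):
                # Blend the four neighbouring embeddings.
                top = grid[y0][x0][d] * (1 - wx) + grid[y0][x1][d] * wx
                bottom = grid[y1][x0][d] * (1 - wx) + grid[y1][x1][d] * wx
                vec.append(top * (1 - wy) + bottom * wy)
            row.append(vec)
        out.append(row)
    return out


def patch_grid(image_size, patch_size):
    """Patches per side for a square image (hypothetical helper)."""
    return image_size // patch_size


# Toy example: a 2x2 grid of 1-dimensional embeddings resized to 3x3.
small = [[[0.0], [2.0]], [[4.0], [6.0]]]
resized = interpolate_pos_embed(small, 3, 3)

# With an assumed patch size of 14, a 980x980 input yields a 70x70 patch grid.
side = patch_grid(980, 14)
```

In practice a framework routine would do this on tensors; the loop form here only makes the arithmetic explicit.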
Sdxl Instructpix2pix 768
An image editing model fine-tuned on Stable Diffusion XL (SDXL) using the InstructPix2Pix method, supporting image editing through natural language instructions.
Image Generation
diffusers
15.88k
50
Segformer B5 Finetuned Ade 640 640
Other
SegFormer is a Transformer-based semantic segmentation model fine-tuned on the ADE20k dataset, suitable for image segmentation tasks.
Image Segmentation Transformers
nvidia
42.32k
39
© 2025 AIbase